Asymptotic Average Redundancy of Huffman (and Shannon-Fano) Block Codes
Authors
Abstract
We study the redundancy of the Huffman code (which, incidentally, is as old as the author of this paper). It has been known since the inception of this code that in the worst case the Huffman redundancy, defined as the excess of the code length over the optimal (ideal) code length, is not more than one. Over more than forty years, insightful, elegant and useful constructions have been set up to determine tighter bounds on the redundancy. One should mention here Gallager's upper bound of $p_1 + 0.086$ (where $p_1$ is the probability of the most likely symbol), and tighter bounds due to Capocelli, De Santis and others. However, to the best of our knowledge no precise asymptotic results have been reported in the literature thus far. We consider here a memoryless binary source generating a sequence of length $n$ distributed as Binomial$(n, p)$, with $p$ being the probability of emitting a 0. Based on a recent result of Stubley, we prove that for $p \neq 1/2$ the average redundancy $\bar{R}_n$ of the Huffman code becomes, as $n \to \infty$,

$$
\bar{R}_n \sim
\begin{cases}
\dfrac{3}{2} - \dfrac{1}{\ln 2} = 0.057304\ldots & \alpha = \log_2 \dfrac{1-p}{p} \text{ irrational}, \\[2ex]
\dfrac{3}{2} - \dfrac{1}{M}\left(\langle \beta M n \rangle - \dfrac{1}{2}\right) - \dfrac{1}{M\left(1 - 2^{-1/M}\right)}\, 2^{-\langle \beta M n \rangle / M} & \alpha = \dfrac{N}{M} \text{ rational},
\end{cases}
$$

where $M$, $N$ are integers such that $\gcd(N, M) = 1$, $\langle x \rangle = x - \lfloor x \rfloor$ is the fractional part of $x$, and $\beta = -\log_2(1-p)$. The appearance of the fractal-like function $\langle \beta M n \rangle$ explains the erratic behavior of the Huffman redundancy and its "resistance" to precise analysis. In fact, from the above we can also recover Gallager's upper bound. As a side result, we prove that the average redundancy of the Shannon-Fano code is

$$
\bar{R}_n^{SF} \sim
\begin{cases}
\dfrac{1}{2} & \alpha = \log_2 \dfrac{1-p}{p} \text{ irrational}, \\[2ex]
\dfrac{1}{2} - \dfrac{1}{M}\left(\langle \beta M n \rangle - \dfrac{1}{2}\right) & \alpha = \dfrac{N}{M} \text{ rational},
\end{cases}
$$

as $n \to \infty$. These findings are obtained through analytic methods such as Fourier analysis and the theory of distribution of sequences modulo 1.

Index Terms: Huffman code, Shannon-Fano code, average redundancy, Fourier analysis, distribution of sequences, Weyl's criterion.

This research was supported in part by NSF Grants NCR-9415491 and CCR-9804760.
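As a rough numerical illustration of the two formulas above (this sketch is not part of the original abstract; names such as block_probs, huffman_avg_len and redundancies are illustrative assumptions), the following Python code computes the exact average redundancy of the Huffman and Shannon-Fano block codes by brute force over all $2^n$ blocks, and compares them with the rational-case expressions for $p = 1/3$, where $\alpha = 1$, i.e. $M = N = 1$.

```python
import heapq
from math import ceil, comb, log2

def block_probs(n, p):
    """Probabilities of all 2**n binary blocks from a memoryless source with P(0) = p."""
    return [p**k * (1 - p)**(n - k)
            for k in range(n + 1)          # k = number of 0's in the block
            for _ in range(comb(n, k))]    # comb(n, k) blocks contain exactly k zeros

def huffman_avg_len(probs):
    """Average Huffman codeword length = sum of the weights of all internal merge nodes."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

def shannon_fano_avg_len(probs):
    """Average length of the Shannon-Fano code with codeword lengths ceil(-log2 P(x))."""
    return sum(q * ceil(-log2(q)) for q in probs)

def redundancies(n, p):
    """Exact average redundancy (Huffman, Shannon-Fano) of the block code of length n."""
    h = -p * log2(p) - (1 - p) * log2(1 - p)   # binary entropy, bits per symbol
    probs = block_probs(n, p)
    return (huffman_avg_len(probs) - n * h,
            shannon_fano_avg_len(probs) - n * h)

# Check for p = 1/3: alpha = log2((1-p)/p) = 1 is rational with M = N = 1, so the
# rational-case formulas above reduce to
#   Huffman:      2 - <beta n> - 2 * 2**(-<beta n>)
#   Shannon-Fano: 1 - <beta n>
# with beta = -log2(1 - p) = log2(3/2).
p = 1 / 3
beta = -log2(1 - p)
for n in range(4, 13):                         # brute force over 2**n blocks: keep n small
    frac = (beta * n) % 1.0                    # fractional part <beta M n>, M = 1
    huff_asym = 2.0 - frac - 2.0 * 2.0 ** (-frac)
    sf_asym = 1.0 - frac
    huff_exact, sf_exact = redundancies(n, p)
    print(f"n={n:2d}  Huffman {huff_exact:.4f} ~ {huff_asym:.4f}   "
          f"Shannon-Fano {sf_exact:.4f} ~ {sf_asym:.4f}")
```

For $p = 1/3$ every block's self-information is $k + n\beta$ for some integer $k$, so the Shannon-Fano values should match the formula essentially exactly, while the Huffman values only approach the oscillating asymptotic values as $n$ grows (the theorem carries an $o(1)$ term).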
Similar Resources
On the Analysis of Variable-to-Variable Length Codes
We use the "conservation of entropy" [1] to derive a simple formula for the redundancy of a large class of variable-to-variable length codes on discrete, memoryless sources. We obtain new asymptotic upper bounds on the redundancy of the "Tunstall-Huffman" code and the "Tunstall-Shannon-Fano" code. For some sources we provide the best existing upper bound for the smallest achievable asymptotic redundancy...
On the Analysis of Variable-to-Variable Length Codes
We use the "conservation of entropy" [1] to simplify the formula for the redundancy of a large class of variable-to-variable length codes on discrete, memoryless sources. This result leads to new asymptotic upper bounds on the redundancy of the "Tunstall-Huffman" code and the "Tunstall-Shannon-Fano" code.
Using an innovative coding algorithm for data encryption
This paper discusses the problem of using data compression for encryption. We first propose an algorithm for breaking a prefix-coded file by enumeration. Based on the algorithm, we respectively analyze the complexity of breaking Huffman codes and Shannon-Fano-Elias codes under the assumption that the cryptanalyst knows the code construction rule and the probability mass function of the source. ...
The Rényi redundancy of generalized Huffman codes
If optimality is measured by average codeword length, Huffman's algorithm gives optimal codes, and the redundancy can be measured as the difference between the average codeword length and Shannon's entropy. If the objective function is replaced by an exponentially weighted average, then a simple modification of Huffman's algorithm gives optimal codes (a sketch of one such modification follows this list of related resources). The redundancy can now be measured as the d...
Redundancy-Related Bounds on Generalized Huffman Codes
This paper presents new lower and upper bounds for the compression rate of optimal binary prefix codes on memoryless sources according to various nonlinear codeword length objectives. Like the most well-known redundancy bounds for minimum (arithmetic) average redundancy coding — Huffman coding — these are in terms of a form of entropy and/or the probability of the most probable input symbol. Th...
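The "simple modification" mentioned in the Rényi-redundancy entry above is usually taken to be the exponential-cost variant of Huffman's algorithm (as studied by Hu-Kleitman-Tamaki and Humblet): the two smallest weights $a$, $b$ are merged into $2^t(a+b)$ rather than $a+b$. The Python sketch below is only an illustration under that assumption; names such as exp_huffman_lengths are not taken from any of the papers listed here.

```python
import heapq
from math import log2

def exp_huffman_lengths(probs, t):
    """Huffman-like construction for the exponentially weighted objective
    (1/t) * log2(sum p_i * 2**(t * l_i)): merge the two smallest weights a, b
    into 2**t * (a + b).  As t -> 0 this reduces to ordinary Huffman coding.
    Returns codeword lengths in the same order as `probs`."""
    # Heap entries: (weight, tie-breaker, indices of the leaves below this node).
    heap = [(w, i, [i]) for i, w in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        wa, _, ia = heapq.heappop(heap)
        wb, _, ib = heapq.heappop(heap)
        for leaf in ia + ib:               # each leaf under the merged node gets one bit deeper
            lengths[leaf] += 1
        heapq.heappush(heap, (2 ** t * (wa + wb), tie, ia + ib))
        tie += 1
    return lengths

def exp_avg_length(probs, lengths, t):
    """Exponentially weighted average codeword length (Campbell's length)."""
    return log2(sum(p * 2 ** (t * l) for p, l in zip(probs, lengths))) / t

probs = [0.5, 0.2, 0.15, 0.1, 0.05]
for t in (0.25, 1.0, 2.0):
    lengths = exp_huffman_lengths(probs, t)
    print(t, lengths, round(exp_avg_length(probs, lengths, t), 4))
```

Larger values of $t$ penalize long codewords more heavily, so the resulting code trees tend to become more balanced as $t$ grows.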
Publication date: 2013